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Abstract — Identifying the Hamiltonian of a quantum system 
from experimental data is considered. General limits on the 
identifiability of model parameters with limited experimental 
resources are investigated, and a specific Bayesian estimation 
procedure is proposed and evaluated for a model system where a- 
priori information about the Hamiltonian 's structure is available. 



I. Introduction 

At a fundamental level nature is governed by the laws of 
quantum mechanics, but until recently such phenomena were 
mostly a curiosity studied by physicists. However, significant 
advances in theory and technology are increasingly pushing 
quantum phenomena into the realm of engineering, as building 
blocks for novel technologies and applications from chemistry 
to computing. E.g., advances in laser technology enable ever 
more sophisticated coherent control of atoms, molecules and 
other quantum systems. Recent advances in nanofabrication 
have made it possible to create nanostructures such as quantum 
dots and quantum wells that behave like artificial atoms or 
molecules and exhibit complex quantum behaviour. Cold-atom 
systems and the creation of Bose condensates demonstrate that 
even macroscopic systems can exhibit quantum coherence. 

Harnessing the potential of quantum systems is a challeng- 
ing task, requiring exquisite control of quantum effects and 
system designs that are robust with regard to fabrication imper- 
fection, environmental noise and loss of coherence. Although 
significant progress has been made in designing effective 
controls, most control design is model-based, and available 
models for many systems do not fully capture their complexity. 
Model parameters are often at best approximately known 
and may vary, in particular for engineered systems subject 
to fabrication tolerances. Experimental system identification 
is therefore crucial for the success of quantum engineering. 
While there has been significant progress in quantum state 
identification and quantum process tomography, we require 
dynamic models if we wish to control a system's evolu- 
tion. Furthermore, effective protocols must take into account 
limitations on measurement and control resources for initial 
device characterization. This presents many challenges, from 
determing how much information can be obtained in a given 
setting to effective and efficient protocols to extract this in- 
formation. Here we illustrate some problems and solutions for 
the case of identifying the dynamics of a three-level system. 



II. Identifiability of Model Parameters 

One of the first questions to consider before attempting to 
find explicit protocols for experimental system identification is 
clearly what information we can hope to extract about a given 
system with a certain limited set of resources. For instance, 
given a system with a Hilbert space of dimension N, it is 
well known that the ability to prepare and measure the system 
in a set of computational basis states {|n) : n = 1 . . . , N} 
is insufficient for quantum process tomography, even if the 
process is unitary [1], [2]. However, recent work shows that a 
substantial amount of information about the generators of the 
dynamics can be obtained for Hamiltonian [3]-[6] and even 
dissipative systems [7]-[10] at least generically, by mapping 
the evolution of the computational basis states stroboscopically 
over time. More precisely, this is done by determining the 
probabilities that a measurement of the observable M = 
diag(mi, . . . , mjv) produces the outcome nig after the system 
was initialized in the computational basis state |fc) and allowed 
to evolve for time t for a number of different times t n . This 
begs the question how much information we can hope to 
obtain in general from such experiments. In this paper we 
consider Hamiltonian systems, whose evolution is governed 
by the Schrodinger equation ihU{t,t a ) = HU(t,t ) with a 
fixed Hamiltonian H and U(t ,t ) = I, for which we have 

Pke (t) = \{e\u(t,t )\k)\ 2 . 

Theorem 1: Let H and M be Hermitian operators repre- 
senting the Hamiltonian and the measurement, respectively, 
and let po be a positive operator with Tr(po) = 1 representing 
the initial state of the system. If M, H and p are simulta- 
neously blockdiagonalizable, i.e., there exists a decomposition 
of the Hilbert space H = ©f^Ws such that 

M = diag(M s ), H = diag(# s ), Po = diag(p s ), (1) 

where M s , H s and p s are operators on the Hilbert spaces H s , 
then we can at most identify H up to A S I S , where I s is 
the identity on the subspace H s . 

Proof: If H is block-diagonal then any initial state 
po starting in a subspace Ti s must remain in this sub- 
space. Thus, the dynamics on each subspace is independent, 
U(t) = ® s U s (t) with U s (t) = e - ltH °. Per hypothesis M 
and po are also blockdiagonal, so Ty:[MU(t)poU(ty] = 
J2 s Tv[M s Us{t)p s U s {tf}. If H = H + £ 5 A S I S then 



U(t) = ® s U s (t) 
Tr[MU(t)poU(t)t] : 



with U s (t) 



-it\. 



e' itK U s (t). Thus, 
U s (t)p s e^U s (t)^} 



= 'Es Tr [ M sU s {t)p s U s (t^] shows that H and H are indis- 
tinguishable. ■ 
Thus, there are some limitations on the maximum amount 
of information we can obtain about the system by initializing 
and measuring the system in a fixed computational basis. In 
particular, if H and M commute, we can infer that H and M 
are simultaneously diagonalizable, and assuming the eigenval- 
ues mi of M are distinct, this fixes the Hamiltonian basis, 
i.e., we have H — J2i A^II^, where Hg is the projector on the 
eigenspace of M corresponding to mg, i.e., the computational 
basis state \£). However, no information about the eigenvalues 
A m or the transition frequencies % = \g — \k can be obtained 
by measuring Pki(t), all of which are constant in this case. 

Maximum information about the Hamiltonian can be 
obtained if H and M are not simulataneously block- 
diagonalizable. This is the generic case, and in this case we 
can identify H at most up to a diagonal unitary matrix D = 
(l,e^ 2 , . . . ,e l0N ) and a global energy shift A I, i.e., H ~ 
H = HD+XqI, as was noted in [6]. The term Aol is gener- 
ally physically insignificant as it gives rise only a global phase 
factor U(t,0) = e -^ H+XaI '> = e - iXot e- itH e- iXot U(t,0), 
which is generally unobservable, as the abelian phase factors 
cancel, p(t) = U(t,0)p o U(t,0)^ = U(t,0)p o U(t,0)^ for any 
p . The diagonal unitary matrix D represents the freedom 
to redefine the measurement basis states, \n) i— ► e*^ n |n) as 
II„ = \n)(n\ = e l ^ n \n){n\e~ % ^ n . The phases 4> n cannot be 
ignored in general but in certain special cases they can be 
effectively eliminated. For example, if H is known to be real- 
symmetric, a common case in physics, then we can choose 
all basis vectors to be real and restrict e l( ^ n to ±1. Moreover, 
if the off-diagonal elements in the computational basis are 
known to be real and positive, Hki — (k\H\£) = \(k\H\£)\, 
then \(k\H\£} \ = \(k\H\£)\ with H as above. Hence, with this 
additional constraint the Hamiltonian is effectively uniquely 
determined (up to a global energy level shift and global 
inversion of the energy levels). 

A constructive procedure for reconstructing a generic un- 
known Hamiltonian from stroboscopic measurements of the 
observables Pke(t) at fixed times t = t n using Bayesian 
parameter estimation techniques was also given in [6]. 

III. Identification with a-priori information 

The previous section shows that when essentially no a- 
priori information about the Hamiltonian is available then even 
measurement of all the observables pki(t) is not sufficient to 
uniquely determine the Hamiltonian. However, in many cases 
some a-priori knowledge about the system is available. For 
instance, the transition frequencies = X v — A M of the 
system, where A„ are the eigenvalues of the Hamiltonian H, 
may be known from available spectroscopic data, and we may 
be able to infer basics such as the level structure and allowed 
transitions from fundamental physical principles. In such cases 
the identification problem can be substantially simplified and 
far less information may be required. 



As a specific simple example, consider a three-level system 
with known transition frequencies W12 and w 2 3 and no direct 
transitions between states |1) and |3) subject to external fields 
driving the (1,2) and (2,3) transitions, respectively. If our 
computational/measurement basis coincides with the eigenba- 
sis of the undriven system, then we know that the Hamiltonian 
of the driven system must be of the form H = H n + f(t)H\ 

[ d t " 

with H — diag(0, W12, W12 + W23) and H\ = d t d 2 

d 2 

i.e., we have only two unknowns, d\ and d%. If we take the 
field to be of the form f(t) — A\ cos(u>i2t) + A 2 cos(w23i), 
i.e., consisting of two components that resonantly drive the 
(1,2) and (2,3) transition, then transforming to a rotating 
frame and making the rotating wave approximation (RWA), 

[ !!i 

we obtain an effective Hamiltonian i? c g 



fii Q 2 
n 2 



where 



flk = dkAk/2h for k = 1,2. If the field amplitudes Ak 
are constant, this Hamiltonian is constant and we could use 
the general protocol in [6] to fully characterize the dynamics 
by stroboscopically measuring the probabilities Pkt{t) for 
k,£ = 1,2,3 at sufficiently many times t n . This requires 
the ability to initialize the system in all three basis states 
\k) and measure the populations of all three states. Due 
to conservation of probability J2ePke = 1 and symmetry 
Phi = Pik, the requirements can be reduced to initialization 
and measurement in two basis states, e.g., |1) and |3), as the 
remaining probabilities can be inferred from the other two, but 
we can do even better by using all the information available. 

We shall assume Hi and f2 2 are real and positive. For 
notational convenience, let fl = \/f2f + an d a = 
arctan(f2 2 /^i) be the polar coordinates of the vector 
(r2i, Q2), i- e -> ^1 = ^cosa and f2 2 = Slsina with ft € M.q 
and a e [0,tt/2]. Then U(t,0) = cxp(-zti7 cff ) is 

"c 2 cos (fii) + s 2 
— icsin(f^) 



-icsm(Qt) 
cos(fH) 



cs [cos (fit) — 1] 
—is sin(fH) 



cs[cos(fH) - 1] — iss'm(nt) s 2 cos(fii) + c 2 



(2) 



where c = cos a and s = sin a. This shows immediately 
that a single measurement trace pki(t) = \(£\U(t,0)\k)\ 2 
except P22 (t) contains information about both parameters and 
thus should be sufficient to fully identify the Hamiltonian. 
Specifically, if we choose to measure pu(t) we obtain 



Pn(t) = x 2 cos 2 (fH) 
= ^ cos(2m) 



■2x(l - 
-2x(l 



x) cos (fit) - 
- a;) cos(Qi) 



(1- 
(1 



2 ' 
2 



using cos 2 (fit) = |[cos(2f2t) + l] and setting x = c 2 = 1 — s 
This shows that there are three frequency components 0, ft and 
2fl, whose amplitudes determine a. 

IV. Efficient Parameter Estimation 

The form of pn(t) suggests Fourier analysis to determine 
the parameters ft and a, e.g., by identifying the non-zero 
Fourier components. The highest frequency peak will be 
at 20 and the corresponding peak amplitude a 2 = x 2 /2 
uniquely determines x = y/2a 2 . In some cases (as in the 



example shown in Fig. [TJ there may be only one clearly 
identifiable non-zero peak in the power spectrum, which could 
correspond to either f2 or 2il. This problem can in principle be 
overcome by estimating x from the average signal (pu(t)) = 
ao(x) = 1 — 2x + |x, from which we can obtain estimates 
for the coefficients a%(x) = \x 2 and a,\(x) = 2x(l — x). If 
02 ^> di then we identify the non-zero-frequency peak with 
20, otherwise with ft. 

Alternatively, we can estimate the base frequency it and the 
signal amplitudes using a Bayesian approach. The signal in 
our case is a linear combination of the basis functions g = 1, 
gi(t) — cos(f2i) and 32 (i) = cos(2f2i). Following standard 
techniques, we maximize the log-likelihood function [6], [11] 



P(u)\d) oc 



m b - N t 



log 



10 



1 - 



m b {h 2 



where mt is the number of basis functions, nib 
case, Nt is the number of data points, and 

- Nt — 1 -. m b — 1 

(d 2 > dl (h 2 } = — £ ' 

n=0 



(3) 



3 in our 



(4) 



m=0 



where the elements /i m of (raj,, 1) -vector h are projections 
of the (l,JVf)-data vector d onto a set of orthonormal basis 
vectors derived from the non-orthogonal basis functions g m (t) 
evaluated at the respective sample times t„ . Concretely, setting 
Gmn = gm(t n ), let A m and e m be the eigenvalues and 
corresponding (normalized) eigenvectors of the mb x m^ 
matrix GG^ with G = (G mn ), and let E — (e m / m ) be a 
matrix whose columns are e m . Then we have H — VG and 
h = _ffd^ with V = diag(o: m )-E^, and the corresponding 
coefficient vector is a = h'V. 

In our case the P(u>\d) is a function of a single frequency w 
and ft is the frequency for which P(w|d) achieves its global 
maximum. If a(f2) is the corresponding coefficient vector, we 
can obtain the best estimate for x = cos 2 a and thus a by 
minimizing ||a(x)— a(f2)|| with a m (x) as defined above. Thus, 
the problem of finding the most likely model (ft, a) is reduced 
to finding the global maximum of P(w|d). Unfortunately, this 
is not an easy task as P(w|d) is sharply peaked and can have 
many local extrema and a substantial noise floor depending 
on the number and accuracy of the data points. One way to 
circumvent this problem is to use the peaks in the discrete 
Fourier spectrum DFT(d) of the data d as input for a gradient- 
based optimization of P(u)\d). To make the peak detection 
simpler and more robust, especially when the data is noisy, we 
find the position wo of the highest peak in the rescaled power 
spectrum F{u) = 201og 10 [|DFT[d- (d)] | 2 + 1], which should 
correspond to either ft or 2ft, and then find the location of 
the maxima u>x and w 2 of P(w|d) in the intervals I\ = [w — 



Aw, wo + Aw] and I 2 = [^wq — Aa 



Aw], where Au 



depends on the resolution of the discrete Fourier transform, 
e.g., Aw w 2tt/T for regularly sampled data. We take the best 
estimate W3 for the system frequency ft to be uj\ if P\ > P%, 
and W2 otherwise, where Pj — P(oJj |d) for j = 1, 2. If Pi and 
P2 differ by less than a certain amount we can flag the system 
suggesting that more data is needed for reliable discrimination. 



To test this strategy 30 Hamiltonians H^l^^a^) with 
Q-k € [0, 2ir] and otk S [0, j] and a range of sampling time 
vectors t = (t n ) with t n G [0, 100] were generated with the 
number of samples Nt ranging from 2 10 to 2 5 . Regular and 
irregular time vector samplings were considered, where for 
irregular samples a (fast) non-uniform Fourier transform was 
used [17]. For each test system and time vector ti, noisy data 
vectors d were generated by simulating actual experiments, 
noting that in a laboratory experiment each data point d n 
would normally be estimated by initializing the system in state 
|1), letting it evolve for time t n , and performing a projective 
measurement Pi = |1)(1|, whose outcome is random, either 
or 1. To estimate the probability pn(t n ) the experiment 
is repeated many times and pn(t n ) approximated by the 
relative frequency d n of l's. The simplest approach is to 
use a fixed number of experiment repetitions iV e for each 
time t n , but noting that the uncertainty of the estimate d n of 
Pn(t n ) is N e shows that it is advantageous to adjust the 
number of repetitions N e for each time t n to achieve a more 
uniform signal-to-noise ratio. Specifically, for each data point 
we sample until d n \fW e w 10 or we reach a maximum number 
of repetitions (here 10 4 ). Although the projection noise for a 
single data point is Poissonian, the overall error distribution 
for a large number of samples is roughly Gaussian, justifying 
the use of a Gaussian error model in the Bayesian analysis. 

As the resolution of the discrete Fourier transform and 
hence the scaled power spectrum is approximately 2ir/T, and 
generally somewhat less for irregular sampling, the uncertainty 
in the peak positions of the power spectrum will generally be at 
least tt/T, limiting the accuracy of the frequency estimates, in 
our case to w 0.0314, regardless of the number of data points. 
This is evident in Fig. [TJ which shows that the peak in power 
spectrum is relatively broad, compared to the peak in the 
likelihood function. Furthermore, the frequency range covered 
by the power spectrum depends on the sampling frequency, 
or the number of data points Nt, with the largest discernible 
frequency approximately N t Tr/T. If the system frequency 
is outside this range covered by the power spectrum, we are 
unable to detect it. For example, for a system with = 4.0484, 
we require N t n/T > and thus N t > 128 data points 
(see Fig. [TJ. If T and Nt are sufficiently large to avoid such 
problems, the location wo of the global maximum of the power 
spectrum usually provides a good starting point for finding 
the global optimum of the log-likelihood function but we can 
generally substantially improve the frequency estimates using 
the likelihood. Of 14440 data sets analyzed (30 test systems 
sampled at different times) wo differed by less than 1% from 
the true system frequency il, or 2il, i.e., E(u>o) < 0.01 with 
E(u> ) = min{|wo-0|/0, |w -2f2|/2f2} in about half (732 1) 
the cases. For almost all failed cases the number of data points 
was too small and il outside the range of the power spectrum. 
Even when restricted to the successful cases as defined above, 
the median of E(ujq) was 0.0035, while the median of the 
relative error E 1(003) = \u>3 — f2|/0 of the final estimate W3 
obtained by maximizing the likelihood was 6.9 x 10~ 6 . 

We also considered finding the global maximum of the 



likelihood by other means, especially in those cases for which 
the power spectrum does not provide a useful initial frequency 
estimator. Since we have a function of a single parameter and 
evaluation of the likelihood, especially when the number of 
data points is small, is not expensive, it is possible to find the 
global maximum simply by exhaustive search. Interestingly, 
we found that log-likelihood still had a clearly identifiable 
global maximum in many cases even when the number of 
data points N t was far below the minimum number of sample 
points required to detect a peak in the power spectrum. E.g., 
for the system shown in Fig. [T[ the likelihood function still 
has a sharp peak around the system frequency even if the 
number of samples is reduced to 32, while the peak is no 
longer detectable in the power spectrum even for Nt = 128 
samples. However, as we reduce the number of samples 
additional peaks in the likelihood function tend to emerge 
at multiples or fractions of f2, as shown in the top inset of 
Fig. [T] This means that we can no longer unambiguously 
identify the true frequency f2. Such aliasing problems leading 
to sampling artefacts in the data analysis can be sustantially 
reduced by avoiding uniform sampling at equally spaced times 
(cf Fig. [T] top inset). In particular low-discrepancy sequences 
have been introduced with the aim to create a sampling 
with minimal regular patterns causing sampling artefacts, but 
also minimising the average gap between the samples for a 
fixed number of samples [15]. Here in particular we use a 
stratified sampling strategy, where a point is placed in each 
stratum of a regular grid according to a uniform probability 
distribution. This may be improved further using other low- 
discrepancy sequences [16]. The results are relevant as a 
significant reduction in the number of data points required 
reduces experimental overheads substantially. This comes at 
additional computational costs, as finding the global maximum 
of the likelihood function for irregular samplings with very 
few data points forms a hard optimization problem. Several 
standard optimization algorithms (simple pattern search and 
stochastic gradient decent) failed to reliably detect the global 
optimum, and exhaustive search had to be used. 

V. Concluding discussion 

We have considered Hamiltonian identification using stro- 
boscopic measurement data of a fixed observable. If the 
system can only be initialized in the measurement basis states 
then a completely unknown Hamiltonian cannot be uniquely 
identified even if we can measure the population of all basis 
states as a function of time. If a-priori information is available, 
however, complete identification of the system parameters is 
often possible with substantially reduced resources. We have 
illustrated this for the case of a three-level system where we 
can only monitor the population of state |1) over time, starting 
in |1), without the possibility of dynamic control or feedback 
as was considered in [12]. The results may be applicable 
to improve the efficiency of identification schemes for other 
systems. E.g., recent work on system identification for spin 
networks [13], [14] has shown that the relevant Hamiltonian 
parameters of a spin chain can be identified by mapping the 
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Fig. 1: Power spectra and log-likelihood for a test system 
with il = 4.0484 for data sampled at different times t in 
[0, 100]. For Nt > 128 the power spectra have a single peak 
in the plotted range, which is a reasonable estimate for fL 
For Nt — 64 and below, the main peak is outside the range 
of the power spectrum and the former no longer contains any 
useful information. Yet, the log-likelihood still has a clearly 
identifiable global maximum at f2 even for data vectors with 
as few as 32 data points, provided a non-uniform sampling is 
used. For uniform sampling with Nt — 32 the top inset shows 
that P(ui\d) has many peaks of approximately equal height 
due to aliasing effects (dashed black line). 



evolution of the first spin and Fourier analysis, but the scheme 
requires repeated quantum state tomography of the first spin 
for many times t n , which is experimentally expensive. 
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